17 research outputs found

    The human RBPome: From genes and proteins to human disease

    RNA binding proteins (RBPs) play a central role in mediating post-transcriptional regulation of genes. However, less is understood about RBPs and their regulatory mechanisms. In this study, we construct a catalogue of 1344 experimentally confirmed RBPs. The domain architecture of RBPs enabled us to classify them into three groups: Classical (29%), Non-classical (19%) and unclassified (52%). The high percentage of proteins with unclassified domains reveals the presence of various uncharacterised motifs that can potentially bind RNA. RBPs were found to be highly disordered compared to Non-RBPs (p < 2.2e-16, Fisher's exact test), suggestive of a dynamic regulatory role of RBPs in cellular signalling and homeostasis. Evolutionary analysis in 62 different species showed that RBPs are highly conserved compared to Non-RBPs (p < 2.2e-16, Wilcoxon test), reflecting the conservation of various biological processes like mRNA splicing and ribosome biogenesis. The expression patterns of RBPs from the Human Proteome Map revealed that ~40% of them are ubiquitously expressed and ~60% are tissue-specific. RBPs were also seen to be highly associated with several neurological disorders, cancer and inflammatory diseases. Anatomical contexts like B cells, T cells, foetal liver and foetal brain were found to be strongly enriched for RBPs, implying a prominent role of RBPs in immune responses and different developmental stages. The catalogue and meta-analysis presented here should form a foundation for furthering our understanding of RBPs and the cellular networks they control in years to come. This article is part of a Special Issue entitled: Proteomics in India.

    ExSurv: A Web Resource for Prognostic Analyses of Exons Across Human Cancers Using Clinical Transcriptomes

    Survival analysis in biomedical sciences is generally performed by correlating the levels of cellular components with patients' clinical features, a common practice in prognostic biomarker discovery. The primary focus of such analysis in cancer genomics has so far been to identify potential prognostic genes. However, alternative splicing, a posttranscriptional regulatory mechanism that affects the functional form of a protein through the inclusion or exclusion of individual exons, giving rise to alternative protein products, has increasingly gained attention due to the prevalence of splicing aberrations in cancer transcriptomes. Hence, uncovering potential prognostic exons can not only help in rationally designing exon-specific therapeutics but also increase specificity toward more personalized treatment options. To address this gap and to provide a platform for rational identification of prognostic exons from cancer transcriptomes, we developed ExSurv (https://exsurv.soic.iupui.edu), a web-based platform for predicting the survival contribution of all annotated exons in the human genome using RNA sequencing-based expression profiles for cancer samples from four cancer types available from The Cancer Genome Atlas. ExSurv enables users to search for a gene of interest and shows survival probabilities for all exons associated with that gene that are significant at the chosen threshold. ExSurv also includes raw expression values across the cancer cohort as well as the survival plots for prognostic exons. Our analysis of the resulting prognostic exons across four cancer types revealed that most of the survival-associated exons are unique to a cancer type, with few processes such as cell adhesion, carboxylic, fatty acid metabolism, and regulation of T-cell signaling common across cancer types, possibly suggesting significant differences in the posttranscriptional regulatory pathways contributing to prognosis.
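
    The abstract above does not say which estimator ExSurv uses for its survival probabilities. As a minimal sketch, assuming the widely used Kaplan-Meier estimator, the per-exon curves could be computed as follows. The function name and the toy cohort are hypothetical, and the numbers are made up for illustration.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # patients leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort split by expression of one exon (made-up numbers).
high = kaplan_meier([5, 8, 12, 12, 20], [1, 1, 0, 1, 0])
low = kaplan_meier([15, 22, 30, 34, 40], [1, 0, 1, 0, 0])
```

    Comparing the two curves (for example with a log-rank test) is one common way to judge whether an exon's expression level is associated with survival.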

    A framework for identifying genotypic information from clinical records: exploiting integrated ontology structures to transfer annotations between ICD codes and Gene Ontologies

    Although some methods have been proposed for automatic ontology generation, none of them addresses the issue of integrating large-scale heterogeneous biomedical ontologies. We propose a novel approach for integrating various types of ontologies efficiently and apply it to integrate the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD9CM) and Gene Ontologies (GO). This approach is one of the early attempts to quantify the associations among clinical terms (e.g. ICD9 codes) based on their corresponding genomic relationships. We reconstructed a merged tree for a partial set of GO and ICD9 codes and measured the performance of this tree in terms of the relevance of its associations by comparing them with two well-known disease-gene datasets (i.e. MalaCards and Disease Ontology). Furthermore, we compared the genomic-based ICD9 associations to the temporal relationships between them in electronic health records. Our analysis shows promising associations supported by both comparisons, suggesting high reliability. We also manually analyzed several significant associations and found promising support in the literature.

    RNA Editing in Pathogenesis of Cancer

    Several adenosine or cytidine deaminase enzymes deaminate transcript sequences in a cell type or environment-dependent manner by a programmed process called RNA editing. RNA editing enzymes catalyze A>I or C>U transcript alterations and have the potential to change protein coding sequences. In this brief review, we highlight some recent work that shows aberrant patterns of RNA editing in cancer. Transcriptome sequencing studies reveal increased or decreased global RNA editing levels depending on the tumor type. Altered RNA editing in cancer cells may provide a selective advantage for tumor growth and resistance to apoptosis. RNA editing may promote cancer by dynamically recoding oncogenic genes, regulating oncogenic gene expression by noncoding RNA and miRNA editing, or by transcriptome scale changes in RNA editing levels that may affect innate immune signaling. Although RNA editing markedly increases complexity of the cancer cell transcriptomes, cancer-specific recoding RNA editing events have yet to be discovered. Epitranscriptomic changes by RNA editing in cancer represent a novel mechanism contributing to sequence diversity independently of DNA mutations. Therefore, RNA editing studies should complement genome sequence data to understand the full impact of nucleic acid sequence alterations in cancer

    SliceIt: A genome-wide resource and visualization tool to design CRISPR/Cas9 screens for editing protein-RNA interaction sites in the human genome

    Several protein-RNA cross-linking protocols have been established in recent years to delineate the molecular interaction of an RNA Binding Protein (RBP) and its target RNAs. However, functional dissection of the role of the RBP binding sites in modulating the post-transcriptional fate of the target RNA remains challenging. The CRISPR/Cas9 genome editing system is commonly employed to perturb both coding and noncoding regions in the genome. With the advancements in genome-scale CRISPR/Cas9 screens, it is now possible not only to perturb specific binding sites but also to probe the global impact of protein-RNA interaction sites across cell types. Here, we present SliceIt (http://sliceit.soic.iupui.edu/), a database of an in silico single guide RNA (sgRNA) library to facilitate conducting such high-throughput screens. SliceIt comprises ~4.8 million unique sgRNAs, with an estimated range of 2-8 sgRNAs designed per RBP binding site, for eCLIP experiments of >100 RBPs in HepG2 and K562 cell lines from the ENCODE project. SliceIt provides a user-friendly environment, developed using the search engine framework Elasticsearch. It is available in both table and genome browser views, facilitating easy navigation of RBP binding sites, designed sgRNAs and exon expression levels across 53 human tissues, along with the prevalence of SNPs and GWAS hits on binding sites. Exon expression profiles enable examination of locus-specific changes proximal to the binding sites. Users can also upload custom tracks in various file formats directly onto the genome browser to navigate additional genomic features and compare with other types of omics profiles. All the binding site-centric information is dynamically accessible via "search by gene", "search by coordinates" and "search by RBP" options and readily available to download. Validation of the sgRNA library in SliceIt was performed by selecting RBP binding sites in the Lipt1 gene and designing sgRNAs. The effect of CRISPR/Cas9 perturbations on the selected binding sites in the HepG2 cell line was confirmed based on altered proximal exon expression levels using qPCR, further supporting the utility of the resource to design experiments for perturbing protein-RNA interaction networks. Thus, SliceIt provides a one-stop repertoire of guide RNA libraries to perturb RBP binding sites, along with several layers of functional information to design both low- and high-throughput CRISPR/Cas9 screens for studying the phenotypes and diseases associated with RBP binding sites.

    Dissecting Protein-RNA Interaction Network in Human Genome

    Indiana University-Purdue University Indianapolis (IUPUI)
    In eukaryotes, gene regulation is a complex multilevel process comprising transcriptional, post-transcriptional, and post-translational control. Although regulation at the transcriptional and post-translational levels is gradually being understood, the protein machinery and the mechanisms underlying post-transcriptional regulation remain to be elucidated. In the first study of this dissertation, I designed and implemented a database of RNA Binding Protein (RBP) Expression and Disease Dynamics (READ-DB: darwin.soic.iupui.edu), a non-redundant, curated database of human RBPs. This RBP knowledge base includes data from different experimental studies, providing a one-stop portal for understanding the expression, evolutionary trajectories, and disease dynamics of RBPs in the context of post-transcriptional regulatory networks. Despite the existence of several experimental procedures to understand the function of RBPs, the lack of a proper computational method to profile differential occupancy limits the scope of research. In the second study, I built a scalable framework for comparing genome-wide protein occupancy profiles across cell-type data, to uncover alterations in protein-RNA interactomes. diffHunter (github.com/Sasanh/diffHunter) is a window-based peak calling and profile comparison method that can efficiently store the base-pair level read information of every given sample in a NoSQL (Not Only SQL) database. It identifies and quantitates the genome-wide binding differences between a pair of samples in two stages: Peak Calling and Differential Binding Identification. Identifying such regions enables us to compare the biologically important regions that differ between two conditions. Finally, I studied A-to-I RNA editing as one of the special functions of an RBP family. ADAR family RBPs are the primary drivers of the conversion of adenosine to inosine (A-to-I) within mRNA. I developed a Cancer-specific RNA-editing Identification using Somatic variation Pipeline (CRISP: github.com/Sasanh/CRISP), a computational framework for accurate identification of A-to-I editing events contributing to the prognosis and stratification of glioblastoma subtypes, as well as the editing events that can serve as molecular classifiers for therapeutic approaches. I proposed two models that explain the cis-regulatory role of A-to-I editing events in noncoding regions in modulating the post-transcriptional regulation of target transcripts in glioblastoma.
    2022-08-1

    Parallel SPICi

    In this paper, a concurrent implementation of the SPICi algorithm is proposed for clustering large-scale protein-protein interaction (PPI) networks. The method selects a defined number of protein seed pairs, expands multiple clusters concurrently from the selected pairs in each run, and terminates when no protein nodes remain to be processed. This approach can cluster large PPI networks with a considerable performance gain over the sequential SPICi algorithm. Experiments show that the parallel approach achieves nearly three times faster clustering on the STRING human dataset on a system with a 4-core CPU while maintaining high clustering quality.

    Kidney Specific Regulatory Network in Mouse Uncovers Functional, Evolutionary and Disease Dynamics

    Digitized for IUPUI ScholarWorks inclusion in 2021.
    Transcription factors (TFs) operate in a combinatorial fashion to regulate the expression of a gene or a group of genes; however, their tissue-specific regulatory interactions are not fully characterized. In this study, we construct and investigate a kidney-specific regulatory (KSR) network for mouse. We obtained upstream regions of genes in the mouse genome from ENSEMBL and extracted DNase I hypersensitive sites (DHS) for 8-week mouse kidney from the ENCODE project. Similarly, the position weight matrices (PWMs) for TF binding motifs (BMo) were extracted from JASPAR, Jolma and TRANSFAC and mapped in the mouse genome using FIMO. These BMo were integrated with the obtained DHS signals (narrow peak) in 5 kb upstream regions. The resulting TFs and their targeted genes were modeled as a directed interaction network comprising 619 TFs and their corresponding 13500 target genes. We trimmed the resulting network by keeping only the genes that function as TFs. The resulting TF-TF network (of 619 nodes) was analyzed to provide a holistic picture of TF-TF interactions in mouse kidney tissue, while the global network was studied for conservation across 61 species and relevance in kidney-associated diseases. We observed that genes related to diseases were significantly enriched in the second and third layers of the network hierarchy. Conservation analysis of the mouse KSR revealed >50% conservation in close relatives such as rat, human, dog and squirrel and lower conservation in invertebrates and yeast, suggesting that network complexity increases with increasing kidney functionality from lower to higher species. In addition, the mouse KSR was examined in its closest relative, rat, for segments of the nephron: TAL (thick ascending limb), PT (proximal tubules) and IMCD (inner medullary collecting duct), which revealed a significant enrichment of TFs for their corresponding original group in the mouse KSR. Further, this network was investigated in diverse model kidney diseases such as hypertension, diabetes and kidney renal clear cell carcinoma (KIRC). The compendium of the network reported in this study can form a roadmap for increasing our understanding of the variations in regulatory wiring in kidney diseases.
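
    The trimming step described in this abstract (keeping only target genes that themselves function as TFs) can be sketched as an edge filter. This is an illustration only: the function name and the node labels are hypothetical, and it assumes that a gene counts as a TF exactly when it appears as a regulator in the edge list.

```python
def trim_to_tf_network(edges):
    """Keep only edges whose target gene is itself a regulator (a TF).

    edges: iterable of (tf, target) pairs from the directed TF -> gene network.
    Assumption: a gene is treated as a TF if it is the source of any edge.
    """
    edges = list(edges)
    tfs = {tf for tf, _ in edges}
    return [(tf, target) for tf, target in edges if target in tfs]


# Hypothetical toy network; labels are placeholders, not real gene names.
edges = [
    ("TF_A", "TF_B"),
    ("TF_B", "TF_C"),
    ("TF_C", "gene_X"),
    ("TF_A", "gene_X"),
]
tf_tf_edges = trim_to_tf_network(edges)
```

    In this toy case only the TF_A -> TF_B and TF_B -> TF_C edges survive, because gene_X never acts as a regulator.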

    Task assignment in tree-like hierarchical structures

    Many large organizations, such as corporations, are hierarchical by nature. In hierarchical organizations, each entity, except the root, is a sub-part of another entity. In this paper, we study the problem of assigning tasks to the entities of a tree-like hierarchical organization. The inherent tree structure introduces an interesting and challenging constraint to the standard assignment problem. Given a tree rooted at a designated node, a set of tasks, and a real-valued function denoting the weight of assigning a node to a task, the Maximum Weight Tree Matching (MWTM) problem aims at finding a maximum weight matching in such a way that no tasks are left unassigned, and none of the ancestors of an already assigned node is allowed to engage in an assignment. When a task is assigned to an entity in a hierarchical organization, the whole entity, including its children, becomes responsible for the execution of that particular task. In other words, if an entity has been assigned to a task, neither its descendants nor its ancestors can be assigned to any task. In the paper, we formally introduce MWTM and prove its NP-hardness. We also propose and experimentally validate an effective heuristic solution based on iterative rounding of a linear programming relaxation for MWTM.
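
    To make the MWTM constraint concrete, here is a small exhaustive search over a toy tree. It is not the paper's LP-rounding heuristic: it simply enumerates every assignment, so it is only usable on tiny inputs. The function names, the tree, and the weights are all hypothetical.

```python
from itertools import permutations


def ancestors(parent, node):
    """Yield every proper ancestor of `node`, walking up to the root."""
    while parent[node] is not None:
        node = parent[node]
        yield node


def is_feasible(parent, chosen):
    """True if no chosen node is an ancestor of another chosen node."""
    chosen_set = set(chosen)
    return all(a not in chosen_set for n in chosen for a in ancestors(parent, n))


def brute_force_mwtm(parent, tasks, weight):
    """Exhaustively find the best feasible assignment (tiny inputs only)."""
    best_value, best_assignment = float("-inf"), None
    # Each permutation assigns distinct nodes to the tasks, in task order.
    for nodes in permutations(parent, len(tasks)):
        if not is_feasible(parent, nodes):
            continue
        value = sum(weight[n, t] for n, t in zip(nodes, tasks))
        if value > best_value:
            best_value, best_assignment = value, dict(zip(tasks, nodes))
    return best_value, best_assignment


# Hypothetical organization: org -> {a, b}, a -> {a1, a2}; parent of root is None.
parent = {"org": None, "a": "org", "b": "org", "a1": "a", "a2": "a"}
tasks = ["t1", "t2"]
weight = {
    ("org", "t1"): 9, ("org", "t2"): 9,
    ("a", "t1"): 5, ("a", "t2"): 4,
    ("b", "t1"): 3, ("b", "t2"): 6,
    ("a1", "t1"): 4, ("a1", "t2"): 1,
    ("a2", "t1"): 2, ("a2", "t2"): 5,
}
best_value, best_assignment = brute_force_mwtm(parent, tasks, weight)
```

    Although the root has the largest individual weights, it can never be paired with another node, since it is an ancestor of every other node. The best feasible assignment here therefore uses the siblings a and b.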

    Health Care Needs of Underserved Populations in the City of Indianapolis

    Meeting the health care needs of underserved populations is crucial. We used electronic medical record (EMR) data to investigate the relationship between diagnoses and patient characteristics, to help providers redesign healthcare systems to meet the needs of underserved patients.